<!--
  This file is a part of the open-eBackup project.
  This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0.
  If a copy of the MPL was not distributed with this file, You can obtain one at
  http://mozilla.org/MPL/2.0/.
  
  Copyright (c) [2024] Huawei Technologies Co.,Ltd.
  
  THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND,
  EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT,
  MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.
  -->

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en-us" xml:lang="en-us">
 <head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="DC.Type" content="topic">
  <meta name="DC.Title" content="Data Deduplication and Compression">
  <meta name="product" content="">
  <meta name="DC.Relation" scheme="URI" content="en-us_topic_0000002164825706.html">
  <meta name="prodname" content="">
  <meta name="version" content="">
  <meta name="brand" content="">
  <meta name="DC.Publisher" content="20250306">
  <meta name="prodname" content="csbs">
  <meta name="documenttype" content="usermanual">
  <meta name="DC.Format" content="XHTML">
  <meta name="DC.Identifier" content="EN-US_TOPIC_0000002200066717">
  <meta name="DC.Language" content="en-us">
  <link rel="stylesheet" type="text/css" href="public_sys-resources/commonltr.css">
  <title>Data Deduplication and Compression</title>
 </head>
 <body style="clear:both; padding-left:10px; padding-top:5px; padding-right:5px; padding-bottom:5px">
  <a name="EN-US_TOPIC_0000002200066717"></a>
  <h1 class="topictitle1">Data Deduplication and Compression</h1>
  <div>
   <p id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_p43941312368">After production data is backed up to the <span id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_text750932617416">product</span>, the <span id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_text16580729155412">product</span> deduplicates and then compresses the backup data to eliminate redundant data and save storage space. Data deduplication and compression are enabled by default, and this setting cannot be changed manually.</p>
   <div class="section" id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_section77794515142">
    <h4 class="sectiontitle">Deduplication</h4>
    <p id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_p6116101118381">The <span id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_text132132710417">product</span> uses variable-length segmentation and double fingerprints to remove duplicate data. Compared with traditional deduplication technologies, these techniques deliver a higher deduplication ratio and lower CPU overhead on the storage system.</p>
    <ul id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_ul63985811510">
     <li id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_li1531487205214">Variable-length segmentation<p id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_p1569917135219"><a name="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_li1531487205214"></a><a name="en-us_topic_0000002164820402_en-us_topic_0000001170463690_li1531487205214"></a>With variable-length segmentation, the system calculates the characteristics of a data stream, determines the segmentation points based on those characteristics, and obtains data blocks of different lengths. When written data changes, only the characteristics of the data near the change location change, while the characteristics of data in other locations remain the same. The difference between the data characteristics before and after the modification therefore depends only on the volume of changed data. Unchanged data produces the same variable-length segments as before, and the storage system deduplicates them, achieving a high deduplication ratio.</p></li>
     <li id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_li1633811615210">Double fingerprints<p id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_p16109219141114"><a name="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_li1633811615210"></a><a name="en-us_topic_0000002164820402_en-us_topic_0000001170463690_li1633811615210"></a>In traditional deduplication, the storage system calculates a fingerprint for each data block and, when a fingerprint matches, reads the stored data from disks and compares it byte by byte. This method consumes excessive CPU resources.</p> <p id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_p197775342117">The <span id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_text8557427546">product</span> uses two hash algorithms to calculate two fingerprints for each data block and compares both of them. It does not need to read data from disks or compare data byte by byte, which saves a large amount of CPU resources and improves system performance.</p></li>
    </ul>
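    <p>The following Python sketch illustrates the idea behind variable-length segmentation. It is not the product's implementation: the function name <code>split_variable</code>, the window size, the block size limits, and the choice of hash are all assumptions made for illustration. A cut point is placed wherever a hash of the most recent bytes meets a fixed condition, so cut points follow the content rather than fixed offsets.</p>
    <pre>
```python
import hashlib
import random

# All four values are illustrative assumptions, not product settings.
WINDOW = 16      # bytes of context that decide a cut point
DIVISOR = 64     # a cut is placed roughly once every 64 candidate positions
MIN_LEN = 32     # shortest block allowed
MAX_LEN = 256    # longest block allowed


def split_variable(data):
    """Split data into blocks whose boundaries depend on the content itself."""
    blocks, start = [], 0
    for end in range(1, len(data) + 1):
        length = end - start
        if length >= MAX_LEN:
            cut = True                      # force a cut so blocks stay bounded
        elif length >= MIN_LEN:
            # Characteristic of the last WINDOW bytes only. A production chunker
            # would use a rolling hash; rehashing keeps the sketch short.
            window = data[end - WINDOW:end]
            digest = hashlib.blake2b(window, digest_size=4).digest()
            cut = int.from_bytes(digest, "big") % DIVISOR == 0
        else:
            cut = False
        if cut:
            blocks.append(data[start:end])
            start = end
    if start != len(data):
        blocks.append(data[start:])         # trailing partial block
    return blocks


# Insert ten bytes in the middle of a stream and compare the two block sets.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(8192))
modified = original[:4000] + b"ten bytes!" + original[4000:]
before, after = split_variable(original), split_variable(modified)
shared = set(before).intersection(after)
print(len(before), len(after), len(shared))
```
    </pre>
    <p>Every block that ends before the insertion point is identical in both runs, because each cut decision depends only on the data read so far. Blocks after the insertion usually line up again once a cut point falls on unchanged content, which is why only the data near a change produces new blocks.</p>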
    <p id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_p986313819368"><a href="#EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_fig431618376300">Figure 1</a> shows the data deduplication process.</p>
    <div class="fignone" id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_fig431618376300">
     <a name="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_fig431618376300"></a><a name="en-us_topic_0000002164820402_en-us_topic_0000001170463690_fig431618376300"></a><span class="figcap"><b>Figure 1 </b>Data deduplication process</span><br><span><img class="eddx" id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_image331633793015" src="en-us_image_0000002164660846.png"></span>
    </div>
    <ol id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_ol1612412196184">
     <li id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_li951712118182">The <span id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_text7461928645">product</span> splits backup data into data blocks of variable lengths.</li>
     <li id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_li16476172216182">Two different fingerprints are calculated for each data block.</li>
     <li id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_li19124111981818">The two fingerprints are compared in sequence against the fingerprint repository. If no match is found for a data block, the block is new and the data compression process starts. Otherwise, the data block duplicates data that is already stored, and the duplicate data block is deleted.<p id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_p1561513339181"><a name="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_li19124111981818"></a><a name="en-us_topic_0000002164820402_en-us_topic_0000001170463690_li19124111981818"></a>The fingerprint repository records the mappings between fingerprints of data blocks and storage locations of data blocks.</p></li>
    </ol>
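    <p>The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the product's code: the class <code>FingerprintRepository</code> is hypothetical, and SHA-256 and BLAKE2b stand in for the two fingerprint algorithms, which this topic does not name. The point of the sketch is that a duplicate is detected by two fingerprint comparisons alone, without reading stored data back.</p>
    <pre>
```python
import hashlib


class FingerprintRepository:
    """Hypothetical repository mapping fingerprint pairs to storage locations."""

    def __init__(self):
        self._index = {}   # first fingerprint -> {second fingerprint: location}
        self._store = []   # stand-in for the disk; holds unique blocks only

    @staticmethod
    def fingerprints(block):
        # Two independent hash algorithms (assumed; the real ones are not documented here).
        return hashlib.sha256(block).digest(), hashlib.blake2b(block).digest()

    def write(self, block):
        """Return (location, was_duplicate) for one data block."""
        first, second = self.fingerprints(block)
        candidates = self._index.get(first)            # first comparison
        if candidates is not None and second in candidates:
            return candidates[second], True            # second comparison matched
        # No match: the block is new. In the product it would now be compressed.
        self._store.append(block)
        location = len(self._store) - 1
        self._index.setdefault(first, {})[second] = location
        return location, False

    def unique_blocks(self):
        return len(self._store)


repo = FingerprintRepository()
for block in (b"alpha", b"beta", b"alpha"):
    print(repo.write(block))
```
    </pre>
    <p>In this sketch the third write returns the location of the first block and stores nothing new, so only two unique blocks are kept.</p>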
   </div>
   <div class="section" id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_section1998131119142">
    <h4 class="sectiontitle">Data Compression</h4>
    <p id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_p97171381420">After deduplication, the remaining data blocks are compressed. The <span id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_text16501428347">product</span> compresses the data blocks into smaller data blocks, and then combines them and writes them to disks. During compression, data that is difficult to compress is compressed using the Huawei-developed dedicated compression algorithm, and other data is compressed using the general compression algorithm, as shown in <a href="#EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_fig75012522013">Figure 2</a>.</p>
    <div class="fignone" id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_fig75012522013">
     <a name="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_fig75012522013"></a><a name="en-us_topic_0000002164820402_en-us_topic_0000001170463690_fig75012522013"></a><span class="figcap"><b>Figure 2 </b>Data compression process</span><br><span><img class="eddx" id="EN-US_TOPIC_0000002200066717__en-us_topic_0000002164820402_en-us_topic_0000001170463690_image105013521115" src="en-us_image_0000002200061565.png"></span>
    </div>
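    <p>The selection between two compression algorithms can be illustrated with the Python sketch below. It rests on several assumptions: the dedicated algorithm is proprietary, so <code>lzma</code> is used purely as a stand-in; <code>zlib</code> plays the role of the general algorithm; and the rule for deciding that data is difficult to compress (a trial compression ratio above <code>HARD_RATIO</code>) is invented for the example. The function names are hypothetical.</p>
    <pre>
```python
import lzma
import os
import zlib

# Assumed threshold: if the general algorithm leaves more than 90% of the
# original size, the block is treated as difficult to compress.
HARD_RATIO = 0.9


def compress_block(block):
    """Return (algorithm name, compressed bytes) for one deduplicated block."""
    general = zlib.compress(block)
    if len(general) > HARD_RATIO * len(block):
        # Stand-in only: the real dedicated algorithm is not publicly documented.
        return "dedicated-stand-in", lzma.compress(block)
    return "general", general


def decompress_block(name, payload):
    """Reverse compress_block using the recorded algorithm name."""
    if name == "general":
        return zlib.decompress(payload)
    return lzma.decompress(payload)


text_block = b"backup data " * 500      # repetitive, compresses well
random_block = os.urandom(4096)         # high entropy, compresses poorly
for block in (text_block, random_block):
    name, payload = compress_block(block)
    print(name, len(block), len(payload))
```
    </pre>
    <p>Recording the algorithm name next to each compressed block lets the reader side pick the matching decompressor, whichever path the block took.</p>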
   </div>
  </div>
  <div>
   <div class="familylinks">
    <div class="parentlink">
     <strong>Parent topic:</strong> <a href="en-us_topic_0000002164825706.html">Data Deduplication and Compression</a>
    </div>
   </div>
  </div>
 </body>
</html>